AI Security
AIディフェンス研究所
Security camp感想
AIJack
PySyft
Generative AI and Large Language Models for Cyber Security: All Insights You Need
Security of LLM Information Hub
TrustLLM: Trustworthiness in Large Language Models
Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
Golden Gate Claude
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
Improving Alignment and Robustness with Circuit Breakers
OpenAI APIで、問題のある発言を検出するmoderationモデルを試してみた
ChatGPT "DAN" (and other "Jailbreaks")
Universal and Transferable Adversarial Attacks on Aligned Language Models
NeMo-Guardrails
LLMにおけるガードレールについて
【2024.9.9 AIアライメントネットワーク設立記念シンポジウム】#1「ALIGNの挑戦」髙橋恒一(ALIGN代表理事)
https://www.youtube.com/watch?v=_13ORbYifbU&t=910s
Singular Learning TheoryとAI Alignmentが結びつくのが面白い。そろそろ寝なければと思っていたのに、眠れなくなった。アラインメントとFree Energyが結びつき、脳のことにまで発想が広がってきた。
Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks
多文化・他言語対応の安全な大規模言語モデルの構築を目指して
https://www.youtube.com/watch?v=NLaayZ4v6Ag
LLMのアウトプットをバリデーションする関数が集まるGuardrails Hubを試す
BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models
SLM as Guardian: Pioneering AI Safety with Small Language Models
LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild
Hacker Panel: What Hackers Can Tell You About AI Security
https://www.youtube.com/watch?v=eoXouUA1raQ
LLMjackingがDeepSeekを標的にする
OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
AIセーフティに関するレッドチーミング手法ガイド Guide to Red Teaming Methodology on AI Safety
Proposed Frontier Model Transparency Framework
Introducing the SAIL Framework: A Practical Guide to Secure AI Systems
AIにおけるコンタミネーションと電磁波やEMPの影響による誤作動についての検討
GRAY SWAN
Myllmdoc
Building a secure code review agent
CAI
How Model Armor can help protect your AI apps from prompt injections and jailbreaks
The Thinking Machines Tinker API is good news for AI control and security
Practical LLM Security Advice from the NVIDIA AI Red Team